Accessing spoken interaction through dialogue processing
Author
Abstract
Written language is one of our primary means of documenting our lives, achievements, and environment. Our capabilities to record, store, and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement, or even replace written documentation. This technology enables us to record information that would otherwise be lost, lower the cost of documentation, and enhance high-quality documents with original audiovisual material. Indexing the audio material is the key technology for realizing those benefits. This work presents effective alternatives to keyword-based indices; these alternatives restrict the search space and can in part be computed with very limited resources. Speech documents can be indexed at various levels. Stylistically, a document belongs to a certain database, which can be determined automatically with high accuracy using very simple features. The resulting search space reduction factor is on the order of 4-10, while topic classification yielded a factor of 18 in a news domain. Since documents can be very long, they need to be segmented into topical regions. A new probabilistic segmentation framework and new features (speaker initiative and style) prove very effective compared with traditional keyword-based methods. At the topical segment level, activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however, even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions was attempted on one database; the detection accuracy for that index, however, was very low. At the utterance level, dialogue acts such as statements, questions, and backchannels (aha, yeah, ...) are recognized using a novel discriminatively trained HMM procedure.
The procedure can be extended to recognize short sequences such as question/answer pairs, so-called dialogue games. Dialogue acts and games are useful for building classifiers for speaking style. Similarly, a user may remember a certain dialogue act sequence and search for it in a graphical representation. In a study with very pessimistic assumptions, users were able to pick one out of four similar and equiprobable meetings correctly with an accuracy of ≈ 43% using graphical activity information, compared with a chance level of 25%. Dialogue acts may be useful in this situation as well, but the sample size did not allow final conclusions to be drawn. However, the user study fails to show any effect for detailed basic features such as formality or speaker identity.
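The abstract reports search space reduction factors without stating how they are computed. One plausible reading, assumed here and not taken from the source, is that the searcher knows the index label of the target document and only scans documents sharing that label. If a fraction p_i of the collection carries label i, the expected fraction left to search is the sum of p_i squared, and the reduction factor is its inverse. The function name and the label lists below are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def reduction_factor(labels: Iterable[Hashable]) -> float:
    """Expected search-space reduction from knowing a document's index label.

    Assumption (not from the source): the target document is drawn uniformly
    from the collection, and the searcher then scans only the documents that
    share its label.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    # With probability p_i the target has label i, and then a fraction p_i
    # of the collection remains, so the expected fraction is sum(p_i ** 2).
    expected_fraction = sum((n / total) ** 2 for n in counts.values())
    return 1.0 / expected_fraction


# Hypothetical label lists, for illustration only.
balanced = ["news", "meeting", "lecture", "phone"] * 25
skewed = ["news"] * 70 + ["meeting"] * 10 + ["lecture"] * 10 + ["phone"] * 10

print(reduction_factor(balanced))  # 4.0
print(reduction_factor(skewed))    # about 1.92
```

Under this reading, a perfectly balanced index over k labels gives a factor of k, and any skew in the label distribution lowers it. Classification errors, which the sketch ignores, would lower the usable factor further.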
Similar resources
A Concurrent, Distributed, and Incremental Spoken Dialogue Architecture with a First Application to Prosody
Current-day spoken dialogue systems are tedious to interact with (Ward et al. 2005). Their naturalness and (measurable) quality of interaction can be improved through incremental (step-by-step) processing schemes that enable dialogue systems to interact continuously (Baumann 2013). However, incremental models have not yet adequately addressed the challenge of joint decision making and optimizatio...
Office message center - a spoken dialogue system
This paper describes the experience gained from structuring a spoken dialogue system and its key components during the design and development of a telephony-based office message center, which integrates auto-attendant, email access, and meeting scheduling capabilities through a spoken dialogue interface. A building block dialogue toolkit has been designed based on these experiences and efficie...
Specialized language models using dialogue predictions
This paper analyses language modeling in spoken dialogue systems for accessing a database. The use of several language models obtained by exploiting dialogue predictions gives better results than the use of a single model for the whole dialogue interaction. For this reason several models have been created, each one for a specific system question, such as the request or the confirmation of a par...
Designing a Portable Spoken Dialogue System
Spoken dialogue systems enable the construction of complex applications involving extended, meaningful interactions with users. Building an effective, generic dialogue system requires techniques and expertise from a number of areas such as natural language, computer-human interaction, and information systems. A key challenge is to design a system through which user-friendly applications can be c...
New Directions in Spoken Dialogue Technology for Pervasive Interfaces
Spoken dialogue technology has emerged over the past decade as a challenging area for researchers in artificial intelligence, speech and language processing, and human-computer interaction. At the same time a number of leading players in the computing industry have been looking seriously at the commercial potential of interactive spoken and multimodal systems. This paper considers the challenges...
An Architecture for Dialogue Management, Context Tracking, and Pragmatic Adaptation in Spoken Dialogue Systems
This paper details a software architecture for discourse processing in spoken dialogue systems, where the three component tasks of discourse processing are (1) Dialogue Management, (2) Context Tracking, and (3) Pragmatic Adaptation. We define these three component tasks and describe their roles in a complex, near-future scenario in which multiple humans interact with each other and with compute...